For our final project, we wanted to examine how and whether 311 cases, and specifically mobility-related 311 cases, could be a useful predictor of a city’s high-injury network. We use Philadelphia as our test city, as there is already a comprehensive high-injury network (HIN) defined and released by the city that we were able to easily analyze. High-injury networks are generally defined as streets that pose higher risk for accidents and injuries to occur, across all modes of transportation. Some cities discern high-injury networks for each mode, such as varying HINs for pedestrian or cyclist injuries, but Philadelphia’s is meant to be comprehensive for all modes. According to the City of Philadelphia’s Vision Zero Action Plan for 2025, 80% of injuries occur on just 12% of city streets; Philadelphia’s high-injury network was formed around data on crash-related injuries and fatalities.
Our original idea for this project was to create a high-injury network for a city that doesn’t have one yet (Pittsburgh), and then compare it to qualitative (311) and demographic data to understand the value of citizen reporting and the injury burden on certain groups. However, we found that both defining and analyzing a HIN for a new city would not be achievable within the time frame, and moved to instead applying the project’s initial idea for 311-related and demographic analysis to Philadelphia’s already-established network.
We seek to build upon existing research on 311 reporting. Significant research has been done on 311 reporting in Boston, using a mixed methods approach that is equal parts social science and anthropology. O’Brien et. al. found that “higher attachment with the space was associated with a greater motivation to enforce local social norms and benefit the broader community.” [1] With this framework in place, the use of mobility-related 311 cases in comparison to actual traffic injuries allowed us to focus in on places not only where reporting is low, but where reporting does not reflect actual conditions. We also take a more socioeconomic approach - seeking to connect the fascinating behavioral dynamics studied by O’Brien et. al. and connect it to variables like commute mode, homeownership, and poverty.
We define mobility-related 311 cases as those that relate to street conditions.
df <- tibble(
mobility = c("Abandoned Vehicle",
"Complaint (Streets)",
"Dangerous Sidewalk",
"Line Striping",
"Other (Streets)",
"Right of Way Unit",
"Right of Way",
"Salting",
"Stop Sign Repair",
"Street Defect",
"Street Paving",
"Street Trees",
"Traffic (Other)",
"Traffic Signal Emergency"))
kable(df,
caption="311 Cases Classified as Mobility Cases")
| mobility |
|---|
| Abandoned Vehicle |
| Complaint (Streets) |
| Dangerous Sidewalk |
| Line Striping |
| Other (Streets) |
| Right of Way Unit |
| Right of Way |
| Salting |
| Stop Sign Repair |
| Street Defect |
| Street Paving |
| Street Trees |
| Traffic (Other) |
| Traffic Signal Emergency |
In this study, we explore the correlations between demographic data, 311 cases, and the presence of high-injury networks, and develop predictive models for metrics around these networks.
To examine demographic data, we utilize 2019 American Community Survey 5-Year estimates for census tracts. 311 Data is also pulled from 2019.
#ONLY NEED TO RUN THIS BLOCK IF RE-DOING ACS DATA
get_acs_csv <- function(yr1, variableList, nameList) {
acs1 <- get_acs(geography = "tract",
year = yr1,
variables = variableList,
geometry = TRUE,
state = "PA",
county = "Philadelphia",
output = "wide"
)
acs_full <- acs1 %>% dplyr:: select(!ends_with("M")) #include this line to remove margins of error
names(acs_full) <- c("GEOID","NAME", nameList,"geometry")
#acs_full <- acs_full[order(acs_full$GEOID),]
#write.csv(acs_full, file=paste(deparse(substitute(variableList)), yr1,"-",yr2, ".csv")) #include this line to export a csv
return(acs_full)
}
#use this line to look at acs variables
#variables19_5 <- load_variables(year = 2019,dataset = "acs5")
variableList <- c("B01001_001", #total population
"B01001_002", #total male
"B01001_026", #total female
"B02001_002", #white alone
"B02001_003", #black alone
"B02001_004", #AmIn/AlNative alone
"B02001_005", #Asian alone
"B02001_006", #Native hawaiian/PI alone
"B02001_007", #some other race alone
"B02001_008", #two plus races
"B03003_003", #total Hispanic or Latino
"B05009_001", #under 18 years B05009_001 for 2010, B09001_001 for 2019
"B15001_001", #18 years + B15001_001 for 2010, B09021_001 for 2019
"B11003_001", #families with children under 18
"B08301_001", #workers
"B08301_002", #means of transportation to work: car
"B08301_010", #means of transportation to work: public transit
"B08301_018", #means of transportation to work: bicycle
"B08301_019", #means of transportation to work: walked
"B08301_021", #means of transportation to work: wfh
"B19013_001", #median income
"B17001_002", #Income in the past 12 months below poverty level
"B17001_031", #Income in the past 12 months at or above poverty level
"B25064_001", #median gross rent
"B25003_002", #owner occupied
"B25003_003" #renter occupied
)
nameList <- c("pop",
"male",
"fem",
"white",
"black",
"native_am",
"asian",
"nh_pi",
"other",
"multi",
"hisp_lat",
"Under_18",
"Above_18",
"fam_child",
"workers",
"Car",
"Transit",
"Bicycle",
"Walking",
"WFH",
"mdn_inc",
"blw_pov",
"abv_pov",
"med_rent",
"own",
"rent")
acs_data <- get_acs_csv(2019,variableList, nameList)
# add percentages
acs_data <- acs_data %>%
mutate(PCT_m = male/pop,
PCT_f = fem/pop,
PCT_wht = white/pop,
PCT_blk = black/pop,
PCT_na = native_am/pop,
PCT_asn = asian/pop,
PCT_pi = nh_pi/pop,
PCT_oth = other/pop,
PCT_mlt = multi/pop,
PCT_hsp = hisp_lat/pop,
PCT_fam = Under_18/pop,
PCT_car = Car/workers,
PCT_trn = Transit/workers,
PCT_bik = Bicycle/workers,
PCT_wlk = Walking/workers,
PCT_wfh = WFH/workers,
PCT_pov = blw_pov/pop,
PCT_own = own/pop,
PCT_rent = rent/pop,
maj_wht = if_else(PCT_wht > 0.50, 1, 0),
maj_blk = if_else(PCT_blk > 0.50, 1, 0))
acs_data <- st_transform(acs_data,crs=4326)
#this line writes the data to a shapefile
st_write(acs_data,"./data/acs_bg.shp", driver="ESRI Shapefile", append=FALSE)
Because we need to combine 311 data, which is point data, ACS data, which is formatted in polygons, and HIN and street data, which is made up of lines (that cross many ACS polygons), we had to consider the best way to integrate both forms. We decided to evaluate HINs through street intersections; to transform the street network into polygons, we created polygons that center at and are equidistant between each intersection. After this, we joined 311 point data to these intersection polygons, then joined ACS data by transferring the data from whichever ACS polygon had the largest overlap to each intersection polygon.
There are a few issues with this structure; for one, the intersections polygons created vary greatly in size due to differing block lengths, especially near parks and the Schuylkill river. We attempted to alleviate this by clipping polygons based on the Schuylkill river. Second, ACS data will not be entirely exact, as there may be differing tracts that overlap each intersection polygon. Despite these concerns, this format is fairly accurate and proved to be easier to work with than other methods of data combination.
Figure 1. Intersection Polygons
#ONLY NEED TO RUN THIS BLOCK IF RE-DOING ACS DATA
#setwd("G:/My Drive/GrSchool/CPLN 505 Planning by Numbers/PBN_Final")
all311 <- read_sf("./data/311_2019_allSelected.shp")
m311 <- all311 %>% filter(service_na %in%
c("Abandoned Vehicle",
"Complaint (Streets)",
"Dangerous Sidewalk",
"Line Striping",
"Other (Streets)",
"Right of Way Unit",
"Right of Way",
"Salting",
"Stop Sign Repair",
"Street Defect",
"Street Paving",
"Street Trees",
"Traffic (Other)",
"Traffic Signal Emergency"))
st_write(m311,"./data/m311.shp", driver="Esri Shapefile", append=FALSE)
hin <- read_sf("./data/high_injury_network_2020.shp")
streets <- read_sf("./data/CompleteStreets.shp")
ip <- read_sf("./data/intersection_polygons_clip.shp")
crash <- read_sf("./data/COLLISION_CRASH_2016_2020.shp")
crash_fatal <- crash %>% filter(FATAL_COUN > 0)
crash_injury <- crash %>% filter(INJURY_COU > 0)
ip2 <- ip %>%
mutate(all311 = lengths(st_intersects(.,all311)),
m311 = lengths(st_intersects(.,m311)),
hin_c=lengths(st_intersects(.,hin)),
hin = if_else(hin_c > 0, 1,0),
crsh_c=lengths(st_intersects(.,crash)),
fatal =if_else(lengths(st_intersects(.,crash_fatal)) > 0, 1, 0),
fatal_c =lengths(st_intersects(.,crash_fatal)),
inj = if_else(lengths(st_intersects(.,crash_injury)) > 0, 1,0),
inj_c = lengths(st_intersects(.,crash_injury))
)
st_write(ip2,"./data/ip.shp", driver="ESRI Shapefile", append=FALSE)
#ip <- st_join(ip,streets,join=st_intersects)
#this line joins polygons to acs data, but takes too long so just do this via spatial join in GIS
#ip <- st_join(ip,acs_data,join=st_intersects, largest=TRUE)
#write.csv(ip_join, file="./data/ip_join.csv")
dat <- read_sf("./data/ip_acs_join.shp")
dat <- select(dat, -c(TARGET_FID, Id, Input_FID, OBJECTID, NODE_ID))
dat <- dat %>%
mutate(lm311 = if_else(m311 < 4, 1, 0),
logm311 = log(m311),
gap = if_else(m311<4 & hin==1, 1,0)
)
dat_bg <- read_sf("./data/ip_acs_join_bg.shp")
dat_bg <- select(dat_bg, -c(TARGET_FID, Id, Input_FID, OBJECTID, NODE_ID))
dat_bg <- dat_bg %>%
mutate(lm311 = if_else(m311 < 4, 1, 0),
logm311 = log(m311),
gap = if_else(m311<4 & hin==1, 1,0)
)
#only need these if mapping
# hin <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/hin.geojson")
# streets <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/CompleteStreets.geojson")
# ip <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/intersection_polygons_clip.geojson")
# m311 <- read_sf("https://github.com/zoenyoo/PBN_Final/raw/main/data/m311.geojson")
# ggplot() +
# geom_point(data = m311, aes(x = lon, y = lat), alpha = .05)
#
# ggplot()+
# geom_sf(data=dat, show.legend = NA, color = palette5[5], size=0.25)
We use summary statistics to get a basic idea of three varying cases: intersections with high mobility cases that are in the HIN, intersections with low mobility cases in the HIN, and intersections with high mobility cases that are not in the HIN. Even with these tables, it is clear that there is some variation; Table 4 illustrates averages for a few variables of interest.
dat1 <- dat %>% st_drop_geometry() %>% filter(m311>3 & hin==1) #high mobility cases and in the hin
dat2 <- dat %>% st_drop_geometry() %>% filter(m311<=3 & hin==1) #low mobility cases and in the hin
dat3 <- dat %>% st_drop_geometry() %>% filter(m311>3 & hin==0) #high mobility cases and not in the hin
dat1 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 1. High Mobility Cases in the HIN") %>%
kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
| Join_Count | all311 | m311 | hin_c | hin | crsh_c | fatal | fatal_c | inj | inj_c | pop | male | fem | white | black | native_am | asian | nh_pi | other | multi | hisp_lat | Under_18 | Above_18 | fam_child | workers | Car | Transit | Bicycle | Walking | WFH | mdn_inc | blw_pov | abv_pov | med_rent | own | rent | PCT_m | PCT_f | PCT_wht | PCT_blk | PCT_na | PCT_asn | PCT_pi | PCT_oth | PCT_mlt | PCT_hsp | PCT_fam | PCT_car | PCT_trn | PCT_bik | PCT_wlk | PCT_wfh | PCT_pov | PCT_own | PCT_rent | maj_wht | maj_blk | lm311 | logm311 | gap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.657815 | 9.101669 | 7.756449 | 1.189681 | 1 | 8.08953 | 0.1062215 | 0.1236722 | 0.8710167 | 6.031866 | 4523.993 | 2122.212 | 2401.781 | 1448.404 | 2167.28 | 21.6305 | 309.5083 | 2.830046 | 437.8998 | 136.4408 | 903.3528 | 991.6055 | 3453.4 | 906.2693 | 1822.61 | 1030.714 | 498.3103 | 31.57132 | 150.758 | 73.12747 | 41563.42 | 1270.546 | 3117.763 | 1029.032 | 841.9256 | 810.7762 | 0.4600813 | 0.5179157 | 0.3126062 | 0.482271 | 0.0045639 | 0.0656056 | 0.0007335 | 0.0825589 | 0.0296578 | 0.1749373 | 0.2008989 | 0.5344043 | 0.2817445 | 0.0162121 | 0.0848734 | 0.0401989 | 0.2702027 | 0.1786143 | 0.1866427 | 0.2488619 | 0.4658574 | 0 | 1.911576 | 0 |
dat2 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 2. Low Mobility Cases in the HIN") %>%
kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
| Join_Count | all311 | m311 | hin_c | hin | crsh_c | fatal | fatal_c | inj | inj_c | pop | male | fem | white | black | native_am | asian | nh_pi | other | multi | hisp_lat | Under_18 | Above_18 | fam_child | workers | Car | Transit | Bicycle | Walking | WFH | mdn_inc | blw_pov | abv_pov | med_rent | own | rent | PCT_m | PCT_f | PCT_wht | PCT_blk | PCT_na | PCT_asn | PCT_pi | PCT_oth | PCT_mlt | PCT_hsp | PCT_fam | PCT_car | PCT_trn | PCT_bik | PCT_wlk | PCT_wfh | PCT_pov | PCT_own | PCT_rent | maj_wht | maj_blk | lm311 | logm311 | gap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.627287 | 1.929196 | 1.18576 | 1.118536 | 1 | 5.61257 | 0.0799523 | 0.089101 | 0.7983294 | 4.129674 | 4403.953 | 2076.395 | 2327.558 | 1530.323 | 1958.685 | 16.83691 | 325.3341 | 2.604216 | 433.2331 | 136.9375 | 880.1348 | 945.8345 | 3383.691 | 880.1714 | 1783.377 | 1041.418 | 455.3874 | 25.32379 | 154.3926 | 71.55847 | 42014.55 | 1222.102 | 3029.864 | 1019.444 | 798.2693 | 807.4614 | 0.4626325 | 0.5158878 | 0.3401637 | 0.4481873 | 0.0038957 | 0.0704639 | 0.0006076 | 0.084716 | 0.030486 | 0.1773042 | 0.1969008 | 0.5463029 | 0.2688997 | 0.0141351 | 0.0889295 | 0.0402165 | 0.2704805 | 0.1732773 | 0.1918049 | 0.2895784 | 0.4089101 | 1 | -Inf | 1 |
dat3 %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) %>% kable(caption="Table 3. High Mobility Cases NOT in the HIN") %>%
kable_styling("striped", full_width = F) %>% scroll_box(width = "1000px", height = "150px")
| Join_Count | all311 | m311 | hin_c | hin | crsh_c | fatal | fatal_c | inj | inj_c | pop | male | fem | white | black | native_am | asian | nh_pi | other | multi | hisp_lat | Under_18 | Above_18 | fam_child | workers | Car | Transit | Bicycle | Walking | WFH | mdn_inc | blw_pov | abv_pov | med_rent | own | rent | PCT_m | PCT_f | PCT_wht | PCT_blk | PCT_na | PCT_asn | PCT_pi | PCT_oth | PCT_mlt | PCT_hsp | PCT_fam | PCT_car | PCT_trn | PCT_bik | PCT_wlk | PCT_wfh | PCT_pov | PCT_own | PCT_rent | maj_wht | maj_blk | lm311 | logm311 | gap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.365113 | 9.036145 | 7.46307 | 0 | 0 | 2.041121 | 0.0065479 | 0.0065479 | 0.5301205 | 1.454165 | 4750.637 | 2259.248 | 2491.389 | 1863.155 | 2092.983 | 17.31456 | 302.2412 | 2.163436 | 321.3965 | 151.3837 | 733.5254 | 1021.575 | 3651.174 | 991.3075 | 2060.079 | 1186.894 | 548.9002 | 52.31823 | 149.572 | 86.87428 | 48607.51 | 1148.94 | 3525.859 | 1055.483 | 990.3332 | 792.4419 | 0.4729311 | 0.5220925 | 0.3906827 | 0.4512141 | 0.0035128 | 0.0589335 | 0.0005067 | 0.0586803 | 0.0314934 | 0.1377884 | 0.2037315 | 0.5616817 | 0.2800231 | 0.0227735 | 0.0706095 | 0.0422709 | 0.2371694 | 0.2068883 | 0.1759053 | 0.3863279 | 0.4313777 | 0 | 1.88581 | 0 |
# HIN_dat <- filter(dat, hin == 1)
# mcase_dat <- filter(dat, m311 > 3)
# HIN_d <- density(HIN_dat$PCT_own)
# mcase_d <- density(mcase_dat$PCT_own)
# acs_d <- density(dat$PCT_own, na.rm=TRUE)
# plot(HIN_d, ylim=c(0,7))
# plot(mcase_d, ylim=c(0,7))
# plot(acs_d, ylim=c(0,7))
Table 4.
numericVars <-
select_if(dat %>% st_drop_geometry(), is.numeric) %>%
select(Join_Count, all311, m311, hin, hin_c, crsh_c, fatal, fatal_c, inj, inj_c, pop, starts_with("PCT")) %>%
na.omit()
ggcorrplot( #correlation plot
round(cor(numericVars), 1),
p.mat = cor_pmat(numericVars),
show.diag = TRUE,
lab = TRUE,
colors = c("#08519c", "white", "#FA342A"),
type="lower",
insig = "blank",
) +
labs(title = "Figure 2. Correlations of Demographic Data, HIN Data, and 311 Data",
subtitle = "Intersection Polygons, Philadelphia, PA")
To initially examine correlation, we utilize a correlation plot for all numeric variables. As we can see from the correlation plot, there are many demographic variables that exhibit correlation between each other, but when looking at correlation to the presence of the HIN, there are no strongly correlated variables outside of crash-related variables.
Figure 3. Income
Figure 3 compares the relative densities of high-injury intersections and intersections with a high number of mobility cases with median income (by census tract). The majority of high-injury intersections with a high number of mobility cases are at this peak around $30,000 median income, and after a breaking point of about $50,000, 311 cases become more frequently reported as incomes rise.
Figure 4. Percentage White
Figure 4, which changes the x-axis to percentage of white residents, illustrates two points: one, High-injury intersections are much more concentrated in less-white census tracts. Two, we see that as the percentage of white residents per tract rises, reporting of (mobility-related) 311 cases become relatively more concentrated.
Figure 5. Homeownership
Figure 5 utilizes home ownership as a comparison. High-injury intersections do peak at slightly lower percentage rates of homeownership; additionally, similar to the past two comparisons, reporting of mobility cases becomes more dense than High-Injury Intersections as the rate of homeownership rises.
To pick variables from the dataset that may be useful for our models, we use backward stepwise regression.
#backward stepwise regression
hin_test <- glm(hin ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam + all311 + m311,
data = dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
lm311_test <- glm(lm311 ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam,
data = dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
# check this for na values
gap_test <- glm(lm311 ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht + maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi + PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam,
data = dat %>% st_drop_geometry() %>% filter(hin==1),
family="binomial" (link="logit"))
drop1(hin_test, test="Chisq")
## Single term deletions
##
## Model:
## hin ~ pop + mdn_inc + med_rent + workers + PCT_f + PCT_wht +
## maj_wht + PCT_blk + maj_blk + PCT_asn + PCT_na + PCT_pi +
## PCT_oth + PCT_mlt + PCT_hsp + PCT_car + PCT_trn + PCT_bik +
## PCT_wlk + PCT_wfh + PCT_own + PCT_pov + PCT_fam + all311 +
## m311
## Df Deviance AIC LRT Pr(>Chi)
## <none> 16860 16912
## pop 1 16862 16912 1.738 0.187355
## mdn_inc 1 16866 16916 5.968 0.014566 *
## med_rent 1 16892 16942 32.388 1.263e-08 ***
## workers 1 16860 16910 0.144 0.704068
## PCT_f 1 16865 16915 5.083 0.024168 *
## PCT_wht 1 16860 16910 0.050 0.822303
## maj_wht 1 16860 16910 0.408 0.522956
## PCT_blk 1 16862 16912 1.741 0.186989
## maj_blk 1 16860 16910 0.087 0.768116
## PCT_asn 1 16866 16916 6.309 0.012012 *
## PCT_na 1 16860 16910 0.481 0.487833
## PCT_pi 1 16860 16910 0.487 0.485421
## PCT_oth 1 16877 16927 16.961 3.815e-05 ***
## PCT_mlt 1 16860 16910 0.301 0.583329
## PCT_hsp 1 16861 16911 1.365 0.242691
## PCT_car 1 16863 16913 3.313 0.068716 .
## PCT_trn 1 16863 16913 2.876 0.089918 .
## PCT_bik 1 16891 16941 30.678 3.046e-08 ***
## PCT_wlk 1 16860 16910 0.272 0.602235
## PCT_wfh 1 16867 16917 6.999 0.008157 **
## PCT_own 1 16965 17015 104.899 < 2.2e-16 ***
## PCT_pov 1 16861 16911 0.756 0.384612
## PCT_fam 1 16878 16928 18.159 2.032e-05 ***
## all311 1 16913 16963 52.615 4.059e-13 ***
## m311 1 16937 16987 76.663 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#step(gap_test, direction="backward")
#step(hin_test, direction="backward")
anova(hin_test, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: hin
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 16865 18076
## pop 1 14.82 16864 18061 0.0001181 ***
## mdn_inc 1 338.36 16863 17723 < 2.2e-16 ***
## med_rent 1 154.50 16862 17568 < 2.2e-16 ***
## workers 1 14.05 16861 17554 0.0001785 ***
## PCT_f 1 17.37 16860 17537 3.076e-05 ***
## PCT_wht 1 63.15 16859 17474 1.915e-15 ***
## maj_wht 1 15.89 16858 17458 6.728e-05 ***
## PCT_blk 1 215.96 16857 17242 < 2.2e-16 ***
## maj_blk 1 4.85 16856 17237 0.0276612 *
## PCT_asn 1 0.55 16855 17237 0.4587719
## PCT_na 1 0.00 16854 17237 0.9602461
## PCT_pi 1 0.01 16853 17237 0.9242424
## PCT_oth 1 11.36 16852 17225 0.0007515 ***
## PCT_mlt 1 14.16 16851 17211 0.0001676 ***
## PCT_hsp 1 3.32 16850 17208 0.0682871 .
## PCT_car 1 30.77 16849 17177 2.899e-08 ***
## PCT_trn 1 14.58 16848 17162 0.0001341 ***
## PCT_bik 1 63.87 16847 17098 1.327e-15 ***
## PCT_wlk 1 25.58 16846 17073 4.252e-07 ***
## PCT_wfh 1 3.08 16845 17070 0.0793406 .
## PCT_own 1 104.29 16844 16966 < 2.2e-16 ***
## PCT_pov 1 4.00 16843 16962 0.0453982 *
## PCT_fam 1 17.94 16842 16944 2.284e-05 ***
## all311 1 6.97 16841 16937 0.0082880 **
## m311 1 76.66 16840 16860 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on AIC values, it appears that mobility-related 311 cases do not improve the model particularly; however, we still include them to get an idea of how mobility-related 311 cases are related.
Our first models predict whether an intersection is part of the High-Injury Network.
#our honed model
hinMod1 <- glm(hin ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + m311,
data=dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
#model based on backwards stepwise
hinMod2 <- glm(hin ~ mdn_inc + med_rent + PCT_f + PCT_blk + PCT_asn +
PCT_oth + PCT_hsp + PCT_car + PCT_trn + PCT_bik + PCT_wfh +
PCT_own + PCT_fam + all311 + m311,
data=dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
#model just using crash and injury/fatal data
hinMod3 <- glm(hin ~ crsh_c + inj_c + fatal_c,
data=dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
hinModb <- glm(hin ~ crsh_c + inj_c + fatal_c,
data=dat_bg %>% st_drop_geometry(),
family="binomial" (link="logit"))
stargazer(hinMod1, hinMod2, hinMod3,
type="html", title="Table 5. High-Injury Network Intersections",
column.labels = c("Model 1", "Model 2", "Model 3"),
single.row = TRUE, digits=2)
| Dependent variable: | |||
| hin | |||
| Model 1 | Model 2 | Model 3 | |
| (1) | (2) | (3) | |
| mdn_inc | -0.0000** (0.0000) | ||
| med_rent | 0.0002*** (0.0001) | 0.001*** (0.0001) | |
| maj_blk | 0.11*** (0.04) | ||
| PCT_f | 0.90** (0.37) | ||
| PCT_blk | 1.06*** (0.12) | ||
| PCT_asn | 2.03*** (0.29) | ||
| PCT_oth | 4.62*** (0.60) | ||
| PCT_hsp | -0.62* (0.36) | ||
| PCT_car | -0.43*** (0.13) | -1.38*** (0.21) | |
| PCT_trn | -1.31*** (0.28) | ||
| PCT_bik | -6.76*** (0.77) | -6.64*** (0.81) | |
| PCT_wfh | -2.91*** (0.67) | ||
| PCT_own | -6.22*** (0.32) | -3.95*** (0.37) | |
| PCT_fam | -2.02*** (0.35) | ||
| all311 | -0.09*** (0.01) | ||
| m311 | 0.03*** (0.004) | 0.12*** (0.01) | |
| crsh_c | 0.15*** (0.02) | ||
| inj_c | 0.12*** (0.03) | ||
| fatal_c | 1.57*** (0.14) | ||
| Constant | -0.05 (0.10) | -0.37*** (0.14) | -1.97*** (0.03) |
| Observations | 16,866 | 16,866 | 16,866 |
| Log Likelihood | -8,643.19 | -8,435.43 | -7,487.66 |
| Akaike Inf. Crit. | 17,300.38 | 16,902.86 | 14,983.32 |
| Note: | p<0.1; p<0.05; p<0.01 | ||
data.frame(100*(exp(coef(hinMod1)))-1) %>%
kable()
| X100….exp.coef.hinMod1……1 | |
|---|---|
| (Intercept) | 94.5429368 |
| med_rent | 99.0235609 |
| maj_blk | 110.7776230 |
| PCT_car | 64.2446404 |
| PCT_bik | -0.8844418 |
| PCT_own | -0.8009256 |
| m311 | 101.6840253 |
We develop three models that are shown in Table 4: Model 2 is the model using most variables suggested by backwards stepwise regression, and Model 3 uses just crash-related data. Model 1 is our most refined model using just demographic variables, both binary and continuous, and a dummy variable for mobility-related 311 cases. As shown by the AIC, Model 3 is the most useful model for predicting the HIN, which is clear due to the fact that the Philadelphia HIN is based around crash data. To contrast, Model 2 is concentrated around specific variables of high significance that we were interested in. For example, as Figure 6 illustrates, our model examines the relationship between the percentage of black residents and presence of the High Injury Network. Based on this model, high injury intersections are 110% more likely to be in majority black tracts.
Figure 6. HIN and Majority Black Census Tracts
These models predict for intersections that have mobility case reporting of 3 or fewer.
# our honed model
l311Mod1 <- glm(lm311 ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + PCT_pov,
data=dat %>% st_drop_geometry(),
family="binomial" (link="logit"))
l311Modb <- glm(lm311 ~ med_rent + maj_blk + PCT_car + PCT_bik + PCT_own + PCT_pov,
data=dat_bg %>% st_drop_geometry(),
family="binomial" (link="logit"))
#model based on backwards stepwise
l311Mod2 <- glm(formula = lm311 ~ mdn_inc + med_rent + workers + PCT_f +
PCT_blk + maj_blk + PCT_hsp + PCT_trn + PCT_bik + PCT_wlk +
PCT_wfh + PCT_own + PCT_pov + PCT_fam, family = binomial(link = "logit"),
data = dat %>% st_drop_geometry())
stargazer(l311Mod1, l311Mod2,
type="html", title="Table 6. Low Mobility Case-Reporting Intersections",
column.labels = c("Model 1", "Model 2"),
single.row = TRUE, digits=2)
| Dependent variable: | ||
| lm311 | ||
| Model 1 | Model 2 | |
| (1) | (2) | |
| mdn_inc | 0.0000*** (0.0000) | |
| med_rent | -0.0002*** (0.0001) | -0.0004*** (0.0001) |
| workers | -0.0002*** (0.0000) | |
| PCT_f | 1.99*** (0.34) | |
| PCT_blk | -1.00*** (0.15) | |
| maj_blk | -0.23*** (0.04) | 0.22** (0.10) |
| PCT_car | 0.61*** (0.12) | |
| PCT_hsp | -0.86*** (0.14) | |
| PCT_trn | -1.54*** (0.19) | |
| PCT_bik | -2.30*** (0.57) | -4.43*** (0.58) |
| PCT_wlk | -0.47* (0.26) | |
| PCT_wfh | -1.24** (0.56) | |
| PCT_own | -0.90*** (0.30) | -0.55 (0.34) |
| PCT_pov | -0.84*** (0.15) | 1.14*** (0.25) |
| PCT_fam | -1.72*** (0.34) | |
| Constant | 1.21*** (0.12) | 1.53*** (0.15) |
| Observations | 16,866 | 16,866 |
| Log Likelihood | -10,269.11 | -10,135.47 |
| Akaike Inf. Crit. | 20,552.21 | 20,300.93 |
| Note: | p<0.1; p<0.05; p<0.01 | |
data.frame(100*(exp(coef(l311Mod1)))-1) %>%
kable()
| X100….exp.coef.l311Mod1……1 | |
|---|---|
| (Intercept) | 335.232596 |
| med_rent | 98.979345 |
| maj_blk | 78.360239 |
| PCT_car | 183.300971 |
| PCT_bik | 8.996052 |
| PCT_own | 39.615710 |
| PCT_pov | 42.171354 |
For low mobility-related cases, we include a refined model (Model 1) and a model that includes all variables suggested by backwards stepwise regression. Though the AIC value is higher for Model 1, all variables included are significant highly significant. For our refined model, we found that low mobility-related311 reporting is 78% MORE LIKELY in majority black tracts; both factors are visualized in Figure 7.
Figure 7. Mobility-Related Cases and Majority Black Census Tracts
Comparing models that we have developed based on demographic data to models based on injury variables, we cannot say that demographic data is an accurate predictor of the presence of High-Injury Network streets. However, our models, whether for High-Injury Network presence, low-mobility 311 cases, or gap intersections, do have a level of predictive power. Additionally, there are certainly interesting and clear patterns that become apparent when spatially and graphically examining the demographic distributions alongside the High-Injury Network.
Since Philadelphia’s High-Injury Network is mainly based on vehicle crash data, including injuries and fatalities, we are basing the accuracy of our models on the trustworthiness of the city’s data. The efficiacy of High-Injury Networks depends on their accuracy, but crash data is also often flawed; for instance, bicycle injuries and crashes are vastly underreported. This may signify a need to include wider variables outside of just injury data when developing HINs, or do further investigation to ensure that data is accurate. Furthermore, the fact that several variables, both binary and continuous, in our dataset are highly significant when predicting for the presence of High-Injury Networks indicates the importance of utilizing demographic data and 311 data, both in the development of HINs for other cities or when conducting analysis on where to prioritize transportation infrastructure investments.
City of Philadelphia. (2022a). OpenDataPhilly. OpenDataPhilly. https://www.opendataphilly.org/
City of Philadelphia. (2022b). Vision Zero Philadelphia. Vision Zero. http://visionzerophl.com/
Conkle, A., & West, C. (2008). Psychology on the Road. APS Observer, 21. https://www.psychologicalscience.org/observer/psychology-on-the-road
Coulson, N. E., & Li, H. (2013). Measuring the external benefits of homeownership. Journal of Urban Economics, 77, 57–67. https://doi.org/10.1016/j.jue.2013.03.005
O’Brien, D. T. (2018). The urban commons: How data and technology can rebuild our communities. Harvard University Press.
O’Brien, D. T., Gordon, E., & Baldwin, J. (2014). Caring about the community, counteracting disorder: 311 reports of public issues as expressions of territoriality. Journal of Environmental Psychology, 40, 320–330. https://doi.org/10.1016/j.jenvp.2014.08.003
Pennsylvania Department of Transportation. (2022). Pennsylvania Crash Information Tool. Pennsylvania Department of Transportation. https://crashinfo.penndot.gov/PCIT/welcome.html
U.S. Census Bureau. (2010-2019). American Community Survey 5-year Estimates.